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O ■ ABSTRACT 

Primordial fluctuations in the cosmic density are usually assumed to take the form of a Gaus- 
sian random field that evolves under the action of gravitational instability. In the early stages, 
^ \ while they have low amplitude, the fluctuations grow linearly. During this phase the Gaus- 

i sian character of the fluctuations is preserved. Later on, when the fluctuations have amplitude 

of order the mean density or larger, non-linear effects cause departures from Gaussianity. In 
\ Fourier space, non-linearity is responsible for coupling Fourier modes and altering the initially 

random distribution of phases that characterizes Gaussian initial conditions. In this paper we 
investigate some of the effects of quadratic non-linearity on basic statistical properties of cos- 
. mological fluctuations. We demonstrate how this form of non-linearity can affect asymptotic 

ly-^ ' properties of density fields such as homogeneity, ergodicity, and behaviour under smoothing. 

Q>^ \ We also show how quadratic density fluctuations give rise to a particular relationship between 

■ the phases of different Fourier modes which, in turn, leads to the generation of a non-vanishing 

00 ' bispectrum. We thus elucidate the relationship between higher-order power spectra and phase 

! distributions. 
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1 INTRODUCTION 
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In most popular versions of the gravitational instability model for the origin of cosmic structure, particularly those involving cosmic inflation 
(Guth 1981; Guth & Pi 1982), the initial fluctuations that seeded the structure formation process form a Gaussian random field (Adler 1981; 
Bardeen et al. 1986). 

Gaussian random fields are the simplest fully-defined stochastic processes, which makes analysis of them relatively straightforward. 
. Robust and powerful statistical descriptors can be constructed that have a firm mathematical underpinning and are relatively simple to 
k>( , implement. Second-order statistics such as the ubiquitous power-spectrum (e.g. Peacock & Dodds 1996) furnish a complete description of 
j_j . Gaussian fields. They have consequently yielded invaluable insights into the behaviour of large-scale structure in the latest generation of 
' redshift surveys, such as the 2dFGRS (Percival et al. 2001). Important though these methods undoubtedly are, the era of precision cosmology 
we are now entering requires more thought to be given to methods for both detecting and exploiting departures from Gaussian behaviour. 

The pressing need for statistics appropriate to the analysis of non-linear stochastic processes also suggests a need to revisit some of 
the fundamental properties cosmologists usually assume when studying samples of the Universe. Gaussian random fields have many useful 
properties. It is straightforward to impose constraints that result in statistically homogeneous fields, for example. Perhaps more relevantly one 
can understand the conditions under which averages over a single spatial domain are well-defined, the constraint of sample-homogeneity. The 
conditions under which such fields can be ergodic are also well established. It is known that smoothing Gaussian fields preserves Gaussianity, 
and so on. These properties are all somewhat related, but not identical. Indeed, as we shall see in Section 4, looking at the corresponding 
properties of non-linear fields turns up some interesting results and delivers warnings to be careful. Exploring these properties is the first aim 
of this paper. 

Even if the primordial density fluctuations were indeed Gaussian, the later stages of gravitational clustering must induce some form 
of non-linearity. One particular way of looking at this issue is to study the behaviour of Fourier modes of the cosmological density field. If 
the hypothesis of primordial Gaussianity is correct then these modes began with random spatial phases. In the early stages of evolution, the 
plane-wave components of the density evolve independently like linear waves on the surface of deep water. As the structures grow in mass, 
they interact with other in non-linear ways, more like waves breaking in shallow water. These mode-mode interactions lead to the generation 
of coupled phases. While the Fourier phases of a Gaussian field contain no information (they are random), non-linearity generates non- 
random phases that contain much information about the spatial pattern of the fluctuations. Although the significance of phase information 
in cosmology is still not fully understood, there have been a number of attempts to gain quantitative insight into the behaviour of phases 
in gravitational systems. Ryden & Gramann (1991), Soda & Suto (1992) and Iain & Bertschinger (1998) concentrated on the evolution of 
phase shifts for individual modes using perturbation theory and numerical simulations. An alternative approach was adopted by Scherrer, 
Melott & Shandarin (1991), who developed a practical method for measuring the phase coupling in random fields that could be applied to 
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real data. Most recently Chiang & Coles (2000), Coles & Chiang (2000), Chiang (2001) and Chiang, Naselsky & Coles (2002) have explored 
the evolution of phase information in some detail. 

Despite this recent progress, there is still no clear understanding of how the behaviour of the Fourier phases manifests itself in more 
orthodox statistical descriptors. In particular there is much interest in the usefulness of the simplest possible generalisation of the (second- 
order) power-spectrum, i.e. the (third-order) bispectrum (Peebles 1980; Scoccimarro et al. 1998; Scoccimarro, Couchman & Frieman 1999; 
Verde et al. 2000; Verde et al. 2001; Verde et al. 2002). Since the bispectrum is identically zero for a Gaussian random field, it is generally 
accepted that the bispectrum encodes some form of phase information but it has never been elucidated exactly what form of correlation 
it measures. Further possible generalisations of the bispectrum are usually called polyspectra; they include the (fourth-order) trispectrum 
(Verde & Heavens 2001) or a related but simpler statistic called the second-spectrum (Stirling & Peacock 1996). Exploring the connection 
between polyspectra and non-linearly induced phase association is the second aim of this paper. 

We investigate both these issues by reference to a particular form of non-Gaussian field that serves both as a kind of phenomenological 
paradigm and as a reasonably realistic model of non-linear evolution from Gaussian initial conditions. The model involves a field which is 
generated by a simple quadratic transformation of a Gaussian distribution, hence the term quadratic non-linearity. Quadratic fields have been 
discussed before from a number of contexts (e.g. Coles & Barrow 1987; Moscardini et al. 1991; Falk, Rangarajan & Srednicki 1993; Luo & 
Schramm 1993; Luo 1994; Gangui et al. 1994; Koyoma, Soda & Taruya 1999; Peebles 1999a,b; Matarrese, Verde & Jimenez 2000; Verde et 
al. 2000; Verde et al. 2001; Komatsu & Spergel 2001; Shandarin 2002; Bartolo, Matarrese & Riotto 2002); for further discussion see Section 
3. Our motivation is therefore very similar to that of Coles & Jones (1991), which introduced the lognormal density field as an illustration of 
some of the consequences of a more extreme form of non-linearity involving an exponential transformation of the linear density field. 

The plan is as follows. In the following section we introduce some fundamental concepts underlying statistical cosmology, more-or-less 
from first principles. We do this in order to allow the reader to see explicitly what assumptions underlie standard statistical practise. In Section 
3 we then look at some of the contexts in which quadratic non-linearity may arise, either primordially or during the non-linear growth of 
structure from Gaussian fields. In Section 4 we revisit some of the basic properties used in Section 2 from the viewpoint of quadratic non- 
linearity and show how some basic implicit assumptions are violated. We then, in Section 5, explore how phase correlations arise in quadratic 
fields and relate these to higher-order statistics of quadratic fields. We discuss the lessons learned from this study in Section 6. 



2 BASIC STATISTICAL CONCEPTS 

We start by giving some general definitions of concepts which we will later use in relation to the particular case of cosmological density 
fields. In order to put our results in a clear context, we develop the basic statistical description of cosmological density fields; see also, e.g., 
Peebles (1980). 



2.1 Fourier Description 

We follow standard practice and consider a region of the Universe having volume Vu, for convenience assumed to be a cube of side L'^ Is, 
where Is is the maximum scale at which there is significant structure due to the perturbations. The region Vu can be thought of as a "fair 
sample" of the Universe if this is the case. It is possible to construct, formally, a "realisation" of the Universe by dividing it into cells of 
volume Vu with periodic boundary conditions at the faces of each cube. This device is often convenient, but in any case one often takes the 
limit Vu — » oo. Let us denote by p the mean density in a volume Vu and take p(x) to be the density at a point in this region specified by the 
position vector x with respect to some arbitrary origin. As usual, we define the fluctuation to be 

5(x) = [p(x) - p\/p. (I) 
We assume this to be expressible as a Fourier series: 

(5(x) = ^ ^ (5k cxp(ik ■ x) = ^ ^ (5k exp(— ik ■ x); (5k — / (5(x) exp(— ik ■ x)cix. (2) 

Vu ' - - 



k 



The Fourier coefficients (5k are complex quantities, (5k = j(5k| exp (i(/>k) with amplitude |(5k| and phase (jik. Conservation of mass in Vu 
implies 5k=o ~ and the reality of (5(x) requires 5^ — 5_k- 

If, instead of the volume Vu, we had chosen a different volume Vu the perturbation within the new volume would again be represented 
by a series of the form (^), but with different coefficients (5k. Now consider a (large) number TV of realisations of our periodic volume and 
label these realisations by Vui, Vu2, Vus, Vun- It is meaningful to consider the probability distribution ^(Jk) of the relevant coefficients 
5k from realisation to realisation across this ensemble. One typically assumes that the distribution is statistically homogeneous and isotropic, 
in order to satisfy the Cosmological Principle, and that the real and imaginary parts of 5k have a Gaussian distribution and are mutually 
independent, so that 

where w stands for either Re [(5k] or Im [(5k] and al — a\l2; a\ is the spectrum. This is the same as the assumption that the phases (^-^ 
are mutually independent and randomly distributed over the interval between <^ = and (\> = 27r. In this case the moduli of the Fourier 
amplitudes have a Rayleigh distribution: 
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P(|<5kUk)d|<5k|#k = %^exp(-M|L^^rf|5^|d0^. (4) 

Because of the assumption of statistical homogeneity and isotropy, the quantity P(5k) depends only on the modulus of the wavevector k 
and not on its direction. It is fairly simple to show that, if the Fourier quantities |(5k| have the Rayleigh distribution, then the probability 
distribution V{5) of 5 = (5(x) in real space is Gaussian, so that: 

where is the variance of the density field 5(x). This is a strict definition of Gaussianity. However, Gaussian statistics do not always require 
the distribution (Q) for the Fourier component amplitudes. According to its Fourier expansion, (5(x) is simply a sum over a large number of 
Fourier modes whose amplitudes are drawn from some distribution. If the phases of each of these modes are random, then the Central Limit 
Theorem will guarantee that the resulting superposition will be close to a Gaussian if the number of modes is large and the distribution of 
amplitudes has finite variance. Such fields are called weakly Gaussian. 



2.2 Covariance Functions & Probability Densities 

We now discuss the real-space statistical properties of spatial perturbations in p. The covariance function is defined in terms of the density 
fluctuation by 

. ([P(x)-p][p(x + r)^p]) ^ ^,(,),(, + r)>. (6) 

The angle brackets in this expression indicate two levels of averaging: first a volume average over a representative patch of the universe 
and second an average over different patches within the ensemble, in the manner of section section 2.1. Applying the Fourier machinery to 
equation one arrives at the Wiener-Khintchin theorem, relating the covariance to the spectral density function or power spectrum, P{k): 

C(r) = ^(15k|')exp(-ikT), (7) 

k 



C(r) = / P{k) exp(-ik • r)dk. (8) 



which, in passing to the limit Ki — ^ oo, becomes 
1 

(2^) 

Averaging equation over r gives 

(C(r)>r = ^ ^(l-^kl') f exp(-ik • r)dr = 0. (9) 
" k •' 

The function ^(r) is the two-point covariance function. In an analogous manner it is possible to define spatial covariance functions for 
N > 2 points. For example, the three-point covariance function is 

^ - p] [p(x + r) - p] [p(x + s) - p]) ^^^^ 

which gives 

C(r,s) = (5(x)5(x + r)5(x + s)>, (II) 

where the spatial average is taken over all the points x and over all directions of r and s such that |r — s| = t: in other words, over all points 
defining a triangle with sides r, s and t. The generalisation of ( p^ to A'^ > 3 is obvious. 

The covariance functions are related to the moments of the probability distributions of (5(x). If the fluctuations form a Gaussian random 
field then the N-variate distributions of the set Si = (5(x, ) are just multivariate Gaussians of the form 

The correlation matrix dj can be expressed in terms of the covariance function 

C^J = {S^5J) = C(r,,). (13) 

It is convenient to go a stage further and define the N-point connected covariance functions as the part of the average ((5i...(5jv) that is not 
expressible in terms of lower order functions e.g. 

{SiSiSa) = {Si)452Sa)c + (<52>c(<5i53)c + {5a}c{5iS2)c + (Si) c {52} 483)0 + {SiSiS-i)^ (14) 

where the connected parts are {5iS2Sa)c, {Si52)c, etc. Since (S) = by construction, {Si}c = (Si) — 0. Moreover, {5i52}c ~ {S1S2) and 
{(5i52(53)c ~ {51S2S3). The second and third order connected parts are simply the same as the covariance functions. Fourth and higher order 
quantities are different, however. The connected functions are just the multivariate generalisation of the cumulants km (Kendall & Stewart 
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1977). One of the most important properties of Gaussian fields is that all of their N-point connected covariances are zero beyond N=2, so 
that their statistical properties are fixed once the set of two-point covariances ( p^ is determined. All large-scale statistical properties are 
therefore determined by the asymptotic behaviour of f (r) as r — + oo. This simplifying property is not shared by non-Gaussian fields, a fact 
we shall explore in Section 4. 



3 QUADRATIC NON-LINEARITY 

In this section we discuss some of the circumstances wherein quadratic non-linearity may arise. At the outset we should admit that this study 
is primarily phenomenological and our intention is largely to use this as a model that displays some consequences of non-linearity. 



3.1 Phenomenology 

As far as we are aware, the first application of quadratic density fields in a cosmological setting was in Coles & Barrow (1987) who were 
studying the possible sample properties of non-Gaussian temperature fluctuations on the cosmic microwave background sky. They in fact 
studied a series of models called Xn models obtained via the transformation 

Y = Xl+Xl + ...Xl, (15) 

where n is the order and the Xi are independent Gaussian random fields with identical covariance functions. Similar models were also 
explored by Moscardini et al. (1991) using Y to model either the primordial density field or the primordial gravitation potential; see also 
(Koyoma, Soda & Taruya 1999; Verde et al. 2000; Matarrese, Verde & Jimenez 2000; Verde et al. 2001; Komatsu Sl Spergel 2001). The case 
n = 1 is the quadratic model of the present paper. There was no physical motivation for this as a model of CMB temperature fluctuations; it 
was used simply because one could calculate analytic results for such a field to compare with similar results for a Gaussian. Indeed the Xn 
random field has been used as a model for non-Gaussian phenomena in a wide range of fields, including surface physics and geology (e.g. 
Adler 1981). 



3.2 Inflation 

Since the early 1980s (e.g. Guth & Pi 1982) it has been commonly believed that the inflationary scenario of the very early Universe results 
in the imprint of primordial Gaussian fluctuations. Since then, and with the invention of increasingly complicated models of the inflationary 
process, it has become clearer that inflation can produce significant levels of non-Gaussianity. The simplest "slow-roll" models of inflation 
involving a single dominant scalar field do indeed produce Gaussian fluctuations, but including non-linear terms and back-reaction does 
produce some element of non-Gaussianity as do extra degrees of freedom (Salopek & Bond 1990; Salopek 1992; Falk, Rangarajan & 
Srednicki 1993; Gangui et al. 1994; Wang & Kamionkowski 2000; Bartolo, Matarrese & Riotto 2002). 

Typically the non-linear contributions arising during inflation manifest themselves as higher-order contributions to the effective Newto- 
nian gravitational potential, <I> i.e. 

$ = <^ + a(<^"-(02)) + ..., (16) 

where is a Gaussian field (not the phase) and a is a constant which is vanishingly small in most models. The term in {(fP') is needed 
to ensure $ has zero mean. Since the mean value of the Newtonian potential is not physically meaningful anyway this term is not really 
important. A more radical suggestion for an inflation-induced quadratic model is offered by Peebles (1999a,b). In this model the density field 
is given by 

p(x) = im>(x)^ (17) 
where i/; is a scalar field and m is an effective mass. We return to the statistical consequences of this particular model in Section 3. 



3.3 Gravitational Non-linearity 

In a simple perturbative model, the non-linear density contrast at a point r can be modelled by the relation 

5(x) = 5i(x) + e52(x) (18) 

where 5i(x) is a Gaussian random field, <52(x) is a quadratic random field derived by squaring 5\ and e is a small factor that controls the 
degree of non-linearity. To be precise we should also include a constant term in this expression in order to ensure that (5(x)) = 0, but this 
does not play any role in the following for reasons mentioned above so we ignore it. Using a constant e is not rigorous but at least qualitatively 
it shows how the lowest non-linear corrections come into play. For a detailed discussion of perturbation theory done properly see Bemardeau 
et al. (2002). 
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3.4 Bias 

Attempts to confront theories of cosmological structure formation witfi observations of galaxy clustering are complicated by the uncertain 
and possibly biased relationship between galaxies and the distribution of gravitating matter. A particular simple and useful way of modelling 
this relationship is through the idea of a local bias. In such models, the propensity of a galaxy to form at a point where the total (local) density 
of matter is p is taken to be some function /(p) (Coles 1993; Fry & Gaztanaga 1993). This boils down to a statement of the form 

5g(x) = /[5(x)], (19) 

where 5g is the density contrast inferred from galaxy counts or other clustering statistics. In the simplest local bias models, f{5) is a constant 
usually called b. Clearly a linear bias of this form simply scales the variance, covariance functions and power-spectra of the underlying field 
but has no effect on the detailed form of the statistical distribution. Models where the bias is non-linear (but still local) are useful as they 
subject constraints on the effect that the bias may have on galaxy clustering statistics, without making any particular assumption about the 
form of / (Coles 1993). Fry & Gaztanaga (1993) discussed the implications of bias with the form 

oo 

/(5)=^M", (20) 

in which the 6„ cannot all be chosen independently because the mean of 5g must again be zero. On scale where the density field is linear, one 
can therefore see that a non-linear bias with 62 7^ will result in quadratic contributions to 5g even if they do not contribute significantly to 
6. 



4 ASYMPTOTIC PROPERTIES 

In developing the statistical background in Section 2, especially the Fourier description of random fluctuating fields, we made a number of 
assumptions along the way that were necessary in order for the resulting descriptors to be well-defined. The terminology relating to these 
assumptions is often used very loosely in cosmology, at least partly because they bear a relatively simple relationship to each other under 
when the fields one is dealing with are Gaussian. However, in the general case of non-Gaussian fields many subtleties arise relating to the 
presence of higher-order statistical correlations. It is especially important in the current era of high-precision developments in statistical 
cosmology also to be precise about the foundations. 

In the following we discuss some of the large-scale properties of random fields in a formal fashion, with particular reference to the 
quadratic model. The behaviour of covariance functions on large-scales will turn out to be very important so, to take the simplest example, 
consider the Peebles model from Section 3.2. In this case we basically have a quadratic density field 5 = ijP where 1/; is assumed a statistically 
homogeneous random field. Let us suppose that 7/; has a well-behaved covariance function r(r) and that the covariance function of 5 is, as 
usual, if (r). It is trivial to show that that 

av) = V\r) (21) 

so that f(r) must be positive for all r. In the more general case of a field of the form 5 = tp -\- aip^, such as the examples given in equations 
( |l5| ) & (^), the resulting covariance function has a behaviour of the form 

ar) = T{r)+a^r\r). (22) 

In this case the covariance function of the resulting field would contain terms of order r(r), the corresponding covariance function of the 
underlying Gaussian field. If r(r) < on some scale then as long as a is small, the resulting ^(r) need not be positive in this case. 



4.1 Statistical Homogeneity 

The formal definition of strict statistical homogeneity for a random field (also called stationarity) is that the set of finite-dimensional joint 
probability distributions, which we called 'Pn{5i, 5n) in Section 2.3, must be invariant under spatial translations, i.e. 

Pjv(5(xi), 5(xjv)) = Pjv(5(xi + x), 5(xjv + x)) (23) 

for any x. This must be true for all orders TV. For a Gaussian random field in which the form of ■Pjv(5i, (5jv) is given by equation (p^, 
necessary and sufficient conditions for 5(x) to be strictly homogeneous is that the covariance function (5(xi)5(x2)) is a function of xi — X2 
only (Adler 1981). Statistical isotropy can be added by requiring rotation-invariance. One can define weaker versions of homogeneity and 
isotropy according to which only the moments of the distribution be translation invariant. For example, second-order homogeneity and 
isotropy (all that is required for the analysis of power-spectra or two-point covariance functions) basically means that the function ^(r) does 
not depend on either the origin or the direction of r, but only on its modulus. Since the properties of a Gaussian random field depend only on 
second-order properties, this weaker condition is sufficient to require condition ( |2^ in this case. 

This does not mean that any function satisfying translation and rotation invariance is necessarily the covariance function of a homoge- 
neous random field (in either the strict or second-order sense). For one thing the power spectrum must be positive (or zero) for all k, which 
places a constraint on the shape of any possible f (r) - which must be convex. The result (S) also implies that 



Jo 



^dr' = (24) 
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for such fields. A perfectly homogeneous distribution would have P{k) = and ^(r) would be identically zero for all r. Note, however, that 
it is possible for fields obeying either ( pl[ ) or ( p2[ ) to be statistically homogeneous. 

4.2 Sample Homogeneity 

Statistical homogeneity plays a vital role in both the analysis of clustering and the formal development of the theory of cosmological 
perturbation growth. Unfortunately the use of the word "homogeneity" in this context leads to a confusion regarding the more fundamental use 
of this word in cosmology. Standard cosmologies are based on the Cosmological Principle, which requires our Universe to be homogeneous 
and isotropic on large scales. More loosely, it needs to be sufficiently homogeneous and isotropic that the Robertson- Walker metric and the 
Friedmann equations furnish an adequate approximation to the evolution of the Universe. Statistical homogeneity as described above is a 
much weaker requirement than the requirement that one realisation from a probability ensemble (our Universe) has asymptotically small 
fluctuations when smoothed on a sufficiently large scale. In fact, the analogous relation to in the case of sample homogeneity is far 
stronger: 

lim / i{r')r'^dr' = (25) 







as this requires the real fluctuations in density to be asymptotically small within a single realisation. Notice that the requirement for sample 
homogeneity is such that in general the covariance function must changes sign, from positive at the origin, where ^(0) = > 0, to negative 
at some r to make the overall integral ( p4| ) converge in the correct way. 

It is clear from this discussion that statistical homogeneity does not require sample homogeneity. Revisiting the quadratic model now 
reveals another interesting point: the Peebles model ( pi] ) can not be sample homogeneous, even if it is statistically homogeneous. If we want 
5 to have a covariance function matching observations, say ^{r) ~ r^^, then the underlying Gaussian field must have r(r) ~ r^^ which 
violates the constraint for it to be sample homogeneous. 

This model behaves in a similar way to fractal models of the type discussed, for example, by Coleman & Pietronero (1992). Mention 
of fractal models tends to send mainstream cosmologists screaming into the hills, but the lack of sample homogeneity they describe need 
not be damaging to standard methods. To give a prominent illustration, consider the behaviour of the gravitational potential field defined by 
a Gaussian random field with the Harrison-Zel'dovich spectrum. In such a case, the fluctuations will always be small (of order 10^^ to be 
consistent with observations) but they are independent of scale, and thus there is never a scale at which sample homogeneity is reached. It is 
not particularly important for the purposes of galaxy clustering studies that the universe obeys the property of sample homogeneity. What is 
more important is estimates of statistical properties obtained from different samples vary with respect to the ensemble-averaged property in 
a fashion which is under control for large samples. This does not require asymptotic convergence to homogeneity. 

Finally, note that the general quadratic model ( p^ can be sample homogeneous if V{r) obeys condition ( ^sj ) and a is sufficiently small. 
Perturbative corrections, such as those described in Section 3.3, do not therefore necessarily induce sample inhomogeneity. 



4.3 Ergodicity and Fair Samples 

We have already introduced the idea of a "fair sample hypothesis", which is basically that averages over finite patches of the Universe can be 
treated as averages over some probability ensemble. Peebles (1980), for example, gives the definition of a fair sample hypothesis in a number 
of ways. First, he states 

"..the fair sample hypothesis is taken to mean that the universe is statistically homogeneous and isotropic." 
Later we find 

"Samples from well separated spots are uncorrelated, and the collection of such samples is a statistical ensemble generated by many independent applica- 
tions..." 

This second definition is close to the one we used in Section 2, but it is clear that it is stronger than the first one. Related to the fair sample 
hypothesis, but not identical to it, is the so-called ergodic property, which is that averages over an infinite domain within a single realization 
can be treated as averages over the probability ensemble. The second definition of a fair sample is a stronger statement than the ergodic 
property, since it involves the properties of finite patches rather than an infinite domain within a single realisation from the probability 
ensemble. 

Ergodic properties are extremely difficult to prove, but results do exist for Gaussian random fields (Adler 1981). Intriguingly, in this case 
the result is extremely simple. The necessary and sufficient condition for a statistically homogeneous Gaussian random field to be ergodic is 
that its power-spectrum (defined above) should be continuous. Continuity of the power-spectrum leads, by standard Fourier analysis, to the 
result that 

lim \ f K(r')]V'^dr' = 0. (26) 
Jo 

This requires the covariance function to be decreasing. In fact, any statistically homogeneous Gaussian field will be ergodic if ^(r) ^ as 
r — > oo. Notice then that a Gaussian random field can be ergodic without being sample homogeneous. 

A general form of this ergodic theorem does not exist for arbitrary non-Gaussian random fields, but fortunately this does not matter. 
What is needed for statistical cosmology is not an ergodic property but something closer to a version of the fair sample hypothesis. The 
ergodic property does not require the fair-sample property, as it involves averages over infinite domains of a single realisation. The fair 
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sample hypothesis does not require ergodicity, either. From this we can conclude that the ergodic property is irrelevant and use of this term 
should be avoided. 

There is, however, one particularly neat relationship between ergodicity and statistical homogeneity for the Peebles model. Notice if we 
take 5 (X ip'^ and require to be a Gaussian random field with covariance function ^(r) then the covariance function of S is just (r), exactly 
the form that appears in equation (p6|). If we take ip to be an ergodic Gaussian random field then this guarantees the resulting quadratic field 
must be at least second-order statistically homogeneous. 



4.4 Asymptotic Independence and Smoothing 

The considerations we have discussed above generally lead to requirements that the correlations between points become small as the separa- 
tion between the points grows large. For a Gaussian field, if ^(r) — > as r — * cx3 then the probability distributions tend to an asymptotically 
independent form. This must be the case because such a field contains only second-order correlations. For example, in the limit that the corre- 
lation matrix dj becomes diagonal, the 2-point Gaussian probability density 'P2(<5i, (52) 'Pi{Si)'Pi(52), consistent with the requirement 
of independence i.e. P{A, B) = P{A)P(B). Similar results will hold for higher order A'^-point distributions. This result means that for 
Gaussian fields absence of (second-order) correlation, i.e. 



means independence. Lack of correlation only requires full independence for Gaussian fields. Independence always implies lack of correlation 
whether the field is Gaussian or not. 

We can see then that even though a non-Gaussian field, such as a quadratic model, may be uncorrelated on large scales, consistent with 
the requirements above, this does not necessarily mean that points are asymptotically independent. 

The reason for discussing this is that it is very relevant to what happens to a random field as it is smoothed on successively larger scales. 
This smoothing is equivalent to filtering the field with a low pass filter. The filtered field, (5(x; Rf), may be obtained by convolution of the 
"raw" density field with some function F having a characteristic scale 7?/: 



The filter F has the following properties: F = constant ~ RJ^ if |x - x'| < Rj, F ~ if |x - x'| > i?/, / F{y; Rf)dy = 1. 

If the underlying density field is Gaussian then the filtered field will also be Gaussian. This is a result of the fact that filtering essentially 
constructs a weighted average of the underlying field and any sum of Gaussian variates is itself a Gaussian variate (e.g. Kendall & Stuart 
1977). According to the Central Limit Theorem, the sum of a large number of independent variates drawn from a distribution with finite 
variance also tends to a Gaussian distribution. One would imagine, therefore, that if distant points were asymptotically independent then 
the effect of filtering on a non-Gaussian field is to "gaussianize" it. In fact this is assumed in standard statistical cosmology. We know the 
small-scale distribution is non-Gaussian, but averaging over sufficiently large smoothing windows is assumed to recover the linear field, 
or something close to it. But for a general non-Gaussian field how quickly do points have to become independent in order for filters to 
Gaussianize the distribution? 

The answer to this question is given by Fan & Bardeen (1995): it depends on a function called the Rosenblatt dependence rate which 
governs the rate at which the maximum value of \P(AB) — P{A)P{B) \ tends to zero at large separations (A and B are any combination 
of values of Si in different locations). This is quite a technical issue, and we leave the details to Fan & Bardeen (1995). On the other hand 
when the field in question is a local transformation of a Gaussian random field (such as in the quadratic model) then there is a simple result 
that the covariance function of the underlying field must fall at least as quickly as as r oo. This is sometimes referred to as the 
pseudo-Markov condition (Adler 1981). If this criterion is satisfied then all local transformations of the underlying field satisfy the mixing 
rate condition and consequently become Gaussian if smoothed on sufficiently large scales. 

Let us look at this issue in the light of the Peebles model. Peebles (1999b) shows that this model is fully self-similar. When smoothed 
on larger and larger scales the distribution function does not tend to a Gaussian but retains the same (x^) form. 

At first sight this looks extremely surprising. Suppose we imagine a simple version of a random field built upon discrete cells of 
some size R. Suppose the field were uniform within each cell but that adjacent cells were generated independently from the distribution. 
Filtering on a scale Rf in this case simply corresponds to adding neighbouring (uncorrelated) cells, producing something like a sum of 
N = {Rf /RY independent values from a quadratic model. This would give rise to a resulting field of the form (|l 5[). As Rf becomes larger, 
A'^ increases and the resulting distribution changes to a distribution of Xn of larger and larger n. I=t is well known that, as n — > oo, the 
distribution of ¥„ is asymptotically close to the Gaussian as expected from the central limit theorem. 

In the case of the Peebles model, however, the mixing rate condition is not satisfied. Although distant points are asymptotically indepen- 
dent, the rate at which they tend to independence is not sufficient to produce a Gaussian distribution. Regardless of the scale of smoothing, 
as long as the covariance function is chosen to be scale-free, the density field retains a distribution having the same shape. This demonstrates 
that this model is, in fact, a kind of fractal as we discussed above. 

Notice that in the more normal case where we take the quadratic contribution to represent only the non-linear effect on initially Gaussian 
fluctuations then it is guaranteed to satisfy the mixing rate condition as long as the Gaussian field does that generates it. It is therefore justified 
to assume that the model described in Section 3.3 does become Gaussian when smoothed on large scales. 



{X1X2) = (Xi>(X2>, 



(27) 




(28) 
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5 QUADRATIC PHASE COUPLING 

In section |^ we pointed out that a convenient definition of a Gaussian field could be made in terms of its Fourier phases, which should by 
independent and uniformly distributed on the interval [0, 2-k\. A breakdown of these conditions, such as the correlation of phases of different 
wavemodes, is a signature that the field has become non-Gaussian. In terms of cosmic large-scale structure formation, non-Gaussian evolution 
of the density field is symptomatic of the onset of non-linearity in the gravitational collapse process, suggesting that phase evolution and non- 
linear evolution are closely linked. A relatively simple picture emerges for models where the primordial density fluctuations are Gaussian and 
the initial phase distribution is uniform. When perturbations remain small evolution proceeds linearly, individual modes grow independently 
and the original random phase distribution is preserved. However, as perturbations grow large their evolution becomes non-linear and Fourier 
modes of different wavenumber begin to couple together This gives rise to phase association and consequently to non-Gaussianity. It is clear 
that phase associations of this type should be related in some way to the existence of the higher order connected covariance functions, which 
are traditionally associated with non-linearity and are non-zero only for non-Gaussian fields. In this sections such a relationship is explored 
in detail using an analytical model for the non-linearly evolving density fluctuation field. Phase correlations of a particular form are identified 
and their connection to the covariance functions is established. 



5.1 A simple non-linear model 

We adopt the simple perturbative expansion of equation ( p^ in order to model the non-linear evolution of the density field. Although the 
equivalent transformation in formal Eulerian perturbation theory is a good deal more complicated, the kind of phase associations that we 
will deal with here are precisely the same in either case. In terms of the Fourier modes, in the continuum limit, we have for the first order 
Gaussian term 

5i(x) = j d^fc |5k| exp exp [ik • x] (29) 
and for the second-order perturbation 

52(x) = [5i(x)]' = j d^fcd^'fc' |(5k||5k'|exp[i(<?ik + <?ik')] exp[i(k + k') -r]. (30) 

The quadratic field, ^2, illustrates the idea of mode coupling associated with non-linear evolution. The non-linear field depends on a specific 
harmonic relationship between the wavenumber and phase of the modes at k and k'. This relationship between the phases in the non-linear 
field, i.e. 

0k + ^k' = ^k+k', (31) 

where the RHS represents the phase of the non-linear field, is termed quadratic phase coupling. 



5.2 The two-point covariance function 

The two-point covariance function can be calculated using the definitions of section ^ namely 

C(r) = (5(x)5(x + r)>. (32) 
Substituting the non-linear transform for (5(x) (equation into this expression gives four terms 

^(r) = {5i(x)5i(x + r)) + e(5i(x)52(x + r)) + e(52(x)5i(x + r)> + e'{52(x)52(x + r)>. (33) 

The first of these terms is the linear contribution to the covariance function whereas the remaining three give the non-linear corrections. We 
shall focus on the lowest order term for now. 

As we outlined in Section 2, the angle brackets () in these expressions are expectation values, formally denoting an average over the 
probability distribution of 5(x). Under the fair sample hypothesis we replace the expectation values in equation ( p2| ) with averages over a 
selection of independent volumes so that () — > ()voi real- ^^^^ average is simply a volume integral over a sufficiently large patch of the 
universe. The second average is over various realisations of the 5k and ipk in the different patches. Applying these rules to the first term of 
equation ( |33[ ) and performing the volume integration gives 
r 

= / d?kd^k' (|5k||<5k' I exp [j(0k + 0k')])real '^^(k + k') exp [ik' • s], 
where 5d is the Dirac delta function. The above expression is simplified given the reality condition 

5k = 51k, (35) 
from which it is evident that the phases obey 

0k + 0-k = mod[27r]. (36) 
Integrating equation ( p4[ ) one therefore finds that 

^11 (r) = / d^fc (|(5k|^)real exp [-ik ■ s]. (37) 



(34) 
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so that the final result is independent of the phases. Indeed this is just the Fourier transform relation between the two-point covariance 
function and the power spectrum we derived in section 2.1. 



5.3 The three-point covariance function 

Using the same arguments outlined above it is possible to calculate the 3-point connected covariance function, which is defined as 

C(r, s) = (<5(x)<5(x + r)<5(x + s)>c. (38) 

Making the non-linear transform of equation one finds the following contributions 

C(r,s) = (5i(x)5i(x + r)5i(x + s)>c -He{5i(x)5i(x + r)52(x + s))c +perms(121,211) 
+e'(5i(x)52(x + r)52(x + s)), + perms(212, 221) 

+e'(52(x)52(x + r)5(x + s)),. (39) 

Again we consider first the lowest order term. Expanding in terms of the Fourier modes and once again replacing averages as prescribed by 
the fair sample hypothesis gives 

(^iii(r,s) = J d^kd^k' d^k" (|5kl l^k' 1 1'^k" 1 ^xp [i((^k + i/'k' + 0k")])real ^^(k + k' + k") exp [ik' ■ r] exp [ik" ■ s]. (40) 

Recall that 5\ is a Gaussian field so that ijik, <^k' and ijik" are independent and uniformly random on the interval [0, 2ti\. Upon integration 
over one of the wavevectors the phase terms is modified so that its argument contains the sum (</ik + <^k' + <^-k-k")' or a permutation 
thereof. Whereas the reality condition of equation (^sj) implies a relationship between phases of anti-parallel wavevectors, no such conditions 
hold for modes linked by the triangular constraint imposed by the Dirac delta function. In other words, except for serendipity, 

<^k + <^k' + '/>-k-k" / 0. (41) 

In fact due to the circularity of phases, the resulting sum is still just uniformly random on the interval [0, 2ti\ if the phases are random. 
Upon averaging over sufficient realisations, the phase term will therefore cancel to zero so that the lowest order contribution to the 3-point 
function vanishes, i.e. ^iii(r, s) = 0. This is not a new result, but it does explicitly illustrate how the vanishing of the three-point connected 
covariance function arises in terms of the Fourier phases. 

Next we consider the first non-linear contribution to the 3-point function given by 

Cii2(r,s) = e(<5i(x)5i(x + r)<52(x + s)), (42) 

or one of its permutations. In this case one of the arguments in the average is the field 52(x), which exhibits quadratic phase coupling of the 
form (^T|). Expanding this term to the point of equation ( po[ ) using the definition ( po[ ) one obtains 

Cii2(r, s) = J d^k<fk' <fk" d^k'" {|(5k||(5k' lli^k" lli^k'" I exp [i(<?!)k + (/-k' + 0k" + <^k"')])real 

x5D(k + k' + k" + k'") exp [ik' ■ r] exp [i(k" + k'") ■ s]. (43) 

Once again the Dirac delta function imposes a general constraint upon the configuration of wavevectors. Integrating over one of the k gives 
k'" = — k — k' — k" for example, so that the wavevectors must form a closed loop. This general constraint however, does not specify a 
precise shape of loop, instead the remaining integrals run over all of the different possibilities. At this point we may constrain the problem 
more tightly by noting that most combinations of the k will contribute zero to (112. This is because of the circularity property of the phases 
and equation (pl|). Indeed, the only nonzero contributions arise where we are able to apply the phase relation obtained from the reality 
constraint, equation (^^. In other words the properties of the phases dictate that the wavevectors must align in anti-parallel pairs: k = — k', 
k" = -k'" and so forth. 

There is a final constraint that must be imposed upon the k if is the connected 3-point covariance function. In a graph theoretic sense, 
the general (unconnected) A''-point function (5;^ (xi)^;^ (x2).-.5ijv {^n)) can be represented geometrically by a sum of tree diagrams. Each 
diagram consists of A'^ nodes of order h, representing the 5i. (xi), and a number of linking lines denoting their correlations; see Fry (1984) 
or Bemardeau (1992) for more detailed accounts. Every node is made up of h internal points, which represent a factor 5k = |5k| exp (i<^k) 
in the Fourier expansion. According to the rules for constructing diagrams, linking lines may join one internal point to a single other, either 
within the same node or in an external node (see Figure 1). The connected covariance functions are represented specifically by the subset of 
diagrams for which every node is linked to at least one other, leaving none completely isolated (Figure I; right). This constraint implies that 
certain pairings of wavevectors do not contribute to the connected covariance function (Figure 2). 

The above constraints may be inserted into equation by re- writing the Dirac delta function as a product over Delta functions of two 
arguments, appropriately normalised. There are only two allowed combinations of wavevectors so we have 

fo(k + k' + k" + k'") -> T^[5D(k + k")fo(k"' + k'") + fc(k + k"')fc(k' + k")]. (44) 

2 Vu 

Integrating over two of the k and using equation ( p6| ) eliminates the phase terms and leaves the final result 

Cii2(r,s) = :^ j d\d\' (|5k|'|<5k'r)reaiexp[ik'T]exp[-i(k + k')-s]. (45) 
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Figure 1. Figure 2. 

Figure 1. Two of the diagrammatic representations of the 3-point function {&\ (x)(5i (x + r)52 (x + s)) . Internal points (black dots) represent factors of 5^ 
in the Fourier expansion, which are joined up in different configurations by the black linking lines. The diagram on the right is connected, in the sense that all 
nodes (blue circles) are joined together in a tree. The diagram on the left does not contribute to the connected function since it has an isolated node. 

Figure 2. Possible wavevector configurations for the diagrams in figure 1 constructed under the constraints imposed by the Fourier phases. On the left the 
wavevectors associated with the single 2nd order node form an anti-parallel pair, as do those from the two 1st order nodes. This configuration does not 
contribute to the connected covariance function. The diagram on the right is that for the connected tree in fig. 1 . The wavevectors associated with the 2nd order 
node align with each of the wavevectors from the 1st order nodes. 



The existence of this quantity has therefore been shown to depend on the quadratic phase coupling of Fourier modes. The relationship 
between modes and the interpretation of the tree diagrams is also dictated by the properties of the phases. 

One may apply the same rules to the higher order terms in equation (BSl). It is immediately clear that the C,i22 terms are zero because 
there is no way to eliminate the phase term exp [j((^k + 4>'k' + + <^k"' + 0k"")l' ^ consequence of the property equation (|4l|). Dia- 
grammatically this corresponds to an unpaired internal point within one of the nodes of the tree. The final, highest order contribution to the 
3-point function is found to be 

C222(r,s) = i2 j (fkd?k'd'k" (|5k|'|<5k'|'l'5k"|'>reaiexp[i(k-k')T]exp[i(k'-k")-s], (46) 
where the phase and geometric constraints allow 12 possible combinations of wavevectors. 



5.4 Cubic non-linearity and higher order 

The above ideas extend simply to higher order where the non-linear field is represented by a perturbation series that does not truncate at the 
quadratic term. At the next highest order for example, the series includes ^3 = 5i, which introduces a cubic phase coupling in the Fourier 
expansion. Although quadratic phase coupling is essential as a minimum requirement for the three point covariance function, cubic phase 
coupling is not the minimum requirement for the next highest order covariance function. Indeed, quadratic coupling is sufficient to provide 
contributions to all of the n-point covariance functions due to the way the phases dictate that wave-vectors must arrange themselves into 
antiparallel pairs. For the 4-point covariance function the cubic term allows for a different diagrammatic representation: a star as opposed to a 
snake topology. However, in terms of the constituent wave-vectors, loops (as in Figure 2) contributing to the star topologies are a symmetric 
subset of those contributing to the snake topologies. 



5.5 Power-spectrum and Bispectrum 

The formal development of the relationship between covariance functions and power-spectra developed above suggests the usefulness of 
higher-order versions of P{k). It is clear from the arguments of Section 5.2 that a more convenient notation for the power-spectrum than that 
introduced in section 2. 1 is 

(<5k5k'> = (27r)='P(fc)fc(k + k'). (47) 

The connection between phases and higher-order covariance functions obtained in Section 5.3 also suggests defining higher-order polyspectra 
of the form 

(5k5k' . . . <5k(") ) = (2^)'Pn(k, k', . . . k("))fc (k + k' + . . . k("') (48) 

where the occurrence of the delta-function in this expression arises from a generalisation of the reality constraint given in equation (p6[); see, 
e.g., Peebles (1980). Conventionally the version of this with n = 3 produces the bispectrum, usually called B(k, k', k") which has found 
much effective use in recent studies of large-scale structure (Peebles 1980; Scoccimarro et al. 1998; Scoccimarro, Couchman & Frieman 
1999; Verde et al. 2000; Verde et al. 2001; Verde et al. 2002). It is straightforward to show that the bispectrum is the Fourier-transform of the 
(reduced) three-point covariance function by following similar arguments as in Section 5.2; see, e.g., Peebles (1980). 



© 0000 RAS, MNRAS 000, 000-000 



Quadratic Density Fields 1 1 



Note that the delta-function constraint requires the bispectrum to be zero except for fc-vectors (k, k', k") that form a triangle in 
fc-space. From Section 5.3 it is clear that the bispectrum can only be non-zero when there is a definite relationship between the phases 
accompanying the modes whose wave-vectors form a triangle. Moreover the pattern of phase association necessary to produce a real and 
non-zero bispectrum is precisely that which is generated by quadratic phase association. This shows, in terms of phases, why it is that the 
leading order contributions to the bispectrum emerge from second-order fluctuations of a Gaussian random field. The bispectrum measures 
quadratic phase coupling. 

Three-point phase correlations have another interesting property. While the bispectrum is usually taken to be an ensemble-averaged 
quantity, as defined in equation (|4^, it is interesting to consider products of terms (5k<5k' ^k" obtained from an individual realisation. Ac- 
cording to the fair sample hypothesis discussed above we would hope appropriate averages of such quantities would yield an estimate of the 
bispectrum. Note that 

5k5k'5k" = 5k5k"5-k-k' = 5k5k'5k+k' = /3(k,k'), (49) 
using the requirement (^) and the triangular constraint we discussed above. Each /3(k, k') will carry its own phase, say <^k,k'. which obeys 
<^k,k' = 4>^ + <Pk' - ^k+k'- (50) 

It is evident from this that it is possible to recover the complete set of phases (l>k from the bispectral phases 0k. k' , up to a constant phase 
offset corresponding to a global translation of the entire structure (Chiang & Coles 2000). This furnishes a conceptually simple method 
of recovering missing or contaminated phase information in a consistent way, an idea which has been exploited, for example, in speckle 
interferometry (Lohmann, Weigelt & Wirnitzer 1983). In the case of quadratic phase coupling, described by equation (^l|), the left-hand-side 



;try (Lo 
i(@is 



of equation (pOj) is identically zero leading to a particularly simple approach to this problem, which we shall explore in future work. 



6 DISCUSSION 

In this paper we have addressed two main issues, using the quadratic model as an illustrative example. First we showed explicitly how this 
non-Gaussian model has properties that contradict standard folklore based on the assumption of Gaussian fluctuations. We used this model to 
distinguish carefully between various inter-related concepts such as sample homogeneity, statistical homogeneity, asymptotic independence, 
ergodicity, and so on. We showed the conditions under which each of these is relevant and deployed the quadratic model for particular 
examples in which they are violated. 

We then used the quadratic model to show, for the first time, how phase association arises in non-linear processes which has exactly the 
correct form to generate non-zero bispectra and three-point covariance functions. The magnitude of these statistical descriptors is of course 
related to the magnitude of the Fourier modes, but the factor that determines whether they are zero or non-zero is the arrangement of the 
phases of these modes. 

The connection between polyspectra and phase information is an important one and it opens up many lines of future research, such as 
how phase correlations relate to redshift distortion and bias. Also, we assumed throughout this study that we could straightforwardly take 
averages over a large spatial domain to be equal to ensemble averages. Using small volumes of course leads to sampling uncertainties which 
are quite straightforward to deal with in the case of the power-spectra but more problematic for higher-order spectra like the bispectrum. 
Understanding the fluctuations about ensemble averages in terms of phases could also lead to important insights. 
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